conditional gradient algorithm
Semi-Proximal Mirror-Prox for Nonsmooth Composite Minimization
We propose a new first-order optimization algorithm to solve high-dimensional non-smooth composite minimization problems. Typical examples of such problems have an objective that decomposes into a non-smooth empirical risk part and a non-smooth regularization penalty. The proposed algorithm, called Semi-Proximal Mirror-Prox, leverages the saddle point representation of one part of the objective while handling the other part via linear minimization over the domain. The algorithm stands in contrast with more classical proximal gradient algorithms with smoothing, which require the computation of proximal operators at each iteration and can therefore be impractical for high-dimensional problems. We establish the theoretical convergence rate of Semi-Proximal Mirror-Prox, which exhibits the optimal complexity bounds.
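To make concrete what "linear minimization over the domain" buys relative to a proximal operator, here is a minimal sketch of a linear minimization oracle in NumPy; it is illustrative, not the paper's implementation, and the function name and $\ell_1$-ball domain are assumptions. The oracle only solves a linear problem over the feasible set, which for an $\ell_1$-ball reduces to inspecting a single coordinate:

```python
import numpy as np

def lmo_l1_ball(grad, radius=1.0):
    """Linear minimization oracle over the l1-ball:
    argmin_{||s||_1 <= radius} <grad, s>.
    The minimizer is a vertex: a 1-sparse vector supported on the
    coordinate where |grad| is largest, with opposite sign."""
    i = np.argmax(np.abs(grad))
    s = np.zeros_like(grad)
    s[i] = -radius * np.sign(grad[i])
    return s
```

For richer domains the gap is starker still: over the nuclear-norm ball, an LMO needs only the top singular vector pair, while a projection or proximal step requires a full SVD, which is the per-iteration cost Semi-Proximal Mirror-Prox avoids on one part of the objective.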
Conditional Gradients for the Approximate Vanishing Ideal
Wirth, Elias, Pokutta, Sebastian
The vanishing ideal of a set of points $X\subseteq \mathbb{R}^n$ is the set of polynomials that evaluate to $0$ over all points $\mathbf{x} \in X$ and admits an efficient representation by a finite set of polynomials called generators. To accommodate noise in the data set, we introduce the pairwise conditional gradients approximate vanishing ideal algorithm (PCGAVI), which constructs a set of generators of the approximate vanishing ideal. The constructed generators capture polynomial structures in the data and give rise to a feature map that can, for example, be used in combination with a linear classifier for supervised learning. In PCGAVI, we construct the set of generators by solving constrained convex optimization problems with the pairwise conditional gradients algorithm. Thus, PCGAVI constructs generators that are not only few in number but also sparse, making the corresponding feature transformation robust and compact. Furthermore, we derive several learning guarantees for PCGAVI that make the algorithm theoretically better motivated than related generator-constructing methods.
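For readers unfamiliar with the pairwise variant, the sketch below shows its defining move; the probability simplex stands in for PCGAVI's actual feasible set and all names are illustrative. Mass is shifted directly from an "away" vertex to the Frank-Wolfe vertex, which is the mechanism that keeps iterates, and hence generators, sparse:

```python
import numpy as np

def pairwise_cg_step(x, grad, max_step):
    """One pairwise conditional gradient step over the probability
    simplex. Weight moves from the worst currently-active vertex
    (the away vertex) to the best vertex (the Frank-Wolfe vertex),
    leaving every other coordinate untouched."""
    fw = np.argmin(grad)                     # Frank-Wolfe vertex
    active = np.flatnonzero(x > 0)           # support of the iterate
    away = active[np.argmax(grad[active])]   # away vertex
    gamma = min(max_step, x[away])           # cannot move more mass than exists
    x = x.copy()
    x[away] -= gamma
    x[fw] += gamma
    return x
```

Because each step changes at most two coordinates, the support of the solution grows slowly, matching the "few and sparse" claim above.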
Constrained Stochastic Nonconvex Optimization with State-dependent Markov Data
Roy, Abhishek, Balasubramanian, Krishnakumar, Ghadimi, Saeed
We study stochastic optimization algorithms for constrained nonconvex stochastic optimization problems with Markovian data. In particular, we focus on the case when the transition kernel of the Markov chain is state-dependent. Such stochastic optimization problems arise in various machine learning problems including strategic classification and reinforcement learning. For this problem, we study both projection-based and projection-free algorithms. In both cases, we establish that the number of calls to the stochastic first-order oracle to obtain an appropriately defined $\epsilon$-stationary point is of the order $\mathcal{O}(1/\epsilon^{2.5})$. In the projection-free setting we additionally establish that the number of calls to the linear minimization oracle is of order $\mathcal{O}(1/\epsilon^{5.5})$. We also empirically demonstrate the performance of our algorithm on the problem of strategic classification with neural networks.
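To see what the two oracle counts refer to, here is a hedged sketch of the generic projection-free update being analyzed; the `lmo` argument and the $\ell_2$-ball example are illustrative, and the paper's gradient estimator additionally has to handle the bias of state-dependent Markovian samples:

```python
import numpy as np

def stochastic_cg_step(x, grad_estimate, lmo, gamma):
    """One projection-free (conditional gradient) step. Producing
    `grad_estimate` costs calls to the stochastic first-order
    oracle; this step adds one call to the linear minimization
    oracle and never projects onto the feasible set."""
    s = lmo(grad_estimate)       # argmin_{s in X} <grad_estimate, s>
    return x + gamma * (s - x)   # convex combination: stays feasible

def lmo_l2_ball(g, radius=1.0):
    """Example LMO for an l2-ball (illustrative domain)."""
    n = np.linalg.norm(g)
    return np.zeros_like(g) if n == 0 else -radius * g / n
```

The gap between the $\mathcal{O}(1/\epsilon^{2.5})$ first-order oracle bound and the $\mathcal{O}(1/\epsilon^{5.5})$ LMO bound suggests the projection-free variant makes many LMO calls per stochastic gradient.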
Second-order Conditional Gradients
Carderera, Alejandro, Pokutta, Sebastian
An immensely powerful approach when $X = \mathbb{R}^n$ is to construct a second-order approximation to $f(x)$ at the current iterate using first- and second-order information, denoted by $\hat{f}(x)$, and move in the direction that minimizes this approximation, giving rise to a family of methods known as Newton methods (Kantorovich, 1948). A damped variant of the former, applied to the minimization of a self-concordant function, converges globally and shows quadratic local convergence when the iterates are close enough to the optimum (Nesterov & Nemirovskii, 1994). The global convergence of this method also extends to strongly convex and smooth functions (Nesterov & Nemirovskii, 1994; Nesterov, 2013). Using a cubic-regularized version of Newton's method, global convergence can also be extended to a broader class of functions than self-concordant or strongly convex and smooth functions (Nesterov & Polyak, 2006). When $X \subsetneq \mathbb{R}^n$ is a convex set, one can use a constrained analog of these methods (Levitin & Polyak, 1966), in which a quadratic approximation to the function is minimized over $X$ at each iteration.
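Spelled out, the quadratic model and the constrained step described above take the following standard form, with $x_k$ denoting the current iterate:

$\hat{f}(x) = f(x_k) + \nabla f(x_k)^\top (x - x_k) + \tfrac{1}{2}(x - x_k)^\top \nabla^2 f(x_k)(x - x_k)$

$x_{k+1} = \arg\min_{x \in X} \hat{f}(x)$

In the unconstrained case $X = \mathbb{R}^n$ this recovers the Newton update $x_{k+1} = x_k - \nabla^2 f(x_k)^{-1} \nabla f(x_k)$ (when the Hessian is invertible); a damped variant scales the step by a factor in $(0, 1]$ to obtain global convergence.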
Multi-output Polynomial Networks and Factorization Machines
Blondel, Mathieu, Niculae, Vlad, Otsuka, Takuma, Ueda, Naonori
Factorization machines and polynomial networks are supervised polynomial models based on an efficient low-rank decomposition. We extend these models to the multi-output setting, i.e., for learning vector-valued functions, with application to multi-class or multi-task problems. We cast this as the problem of learning a 3-way tensor whose slices share a common basis and propose a convex formulation of that problem. We then develop an efficient conditional gradient algorithm and prove its global convergence, despite the fact that it involves a non-convex basis selection step. On classification tasks, we show that our algorithm achieves excellent accuracy with much sparser models than existing methods. On recommendation system tasks, we show how to combine our algorithm with a reduction from ordinal regression to multi-output classification and show that the resulting algorithm outperforms simple baselines in terms of ranking accuracy.
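The "non-convex basis selection step" inside the conditional gradient loop is, generically, a search for a new rank-one atom that best correlates with the negative gradient. The sketch below is a standard power-iteration heuristic on a gradient matrix, simplified from the paper's 3-way tensor setting; all names are illustrative, not the authors' exact procedure:

```python
import numpy as np

def select_rank_one_atom(G, n_iter=50, seed=0):
    """Approximately solve max_{||u||=||v||=1} |u^T G v| by power
    iteration on G, i.e., find the dominant singular vector pair.
    The rank-one atom u v^T would then join the model's shared
    basis, with its weight fit by convex optimization."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(G.shape[1])
    v /= np.linalg.norm(v)
    for _ in range(n_iter):
        u = G @ v
        u /= np.linalg.norm(u)
        v = G.T @ u
        v /= np.linalg.norm(v)
    return u, v
```

The paper's global convergence guarantee is notable precisely because this selection subproblem is non-convex, so each iteration only finds an approximate atom.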